Consistent feature attribution for tree ensembles
Authors
Abstract
It is critical in many applications to understand what features are important for a model, and why individual predictions were made. For tree ensemble methods these questions are usually answered by attributing importance values to input features, either globally or for a single prediction. Here we show that current feature attribution methods are inconsistent, which means changing the model to rely more on a given feature can actually decrease the importance assigned to that feature. To address this problem we develop fast exact solutions for SHAP (SHapley Additive exPlanation) values, which were recently shown to be the unique additive feature attribution method based on conditional expectations that is both consistent and locally accurate. We integrate these improvements into the latest version of XGBoost, demonstrate the inconsistencies of current methods, and show how using SHAP values results in significantly improved supervised clustering performance. Feature importance values are a key part of understanding widely used models such as gradient boosting trees and random forests, so improvements to them have broad practical implications.
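The local accuracy property mentioned in the abstract can be illustrated with a brute-force sketch. This is not the fast exact tree algorithm the paper develops: it enumerates every feature subset, so it only suits tiny inputs. The two-feature `toy_model`, the helper names, and the assumption that features are independent and uniform on {0, 1} are all hypothetical choices made for illustration.

```python
from itertools import combinations, product
from math import factorial


def toy_model(x):
    """Hypothetical depth-2 tree: predicts 80 only when both binary features are 1."""
    return 80.0 if x[0] == 1 and x[1] == 1 else 0.0


def expected_value(f, x, known, n):
    """E[f(X) | X_S = x_S] for S = known, assuming independent features uniform on {0, 1}."""
    background = list(product([0, 1], repeat=n))
    total = 0.0
    for z in background:
        # Keep the known features fixed at x and fill the rest from the background point.
        point = tuple(x[i] if i in known else z[i] for i in range(n))
        total += f(point)
    return total / len(background)


def shapley_values(f, x):
    """Brute-force Shapley values over conditional expectations (exponential in len(x))."""
    n = len(x)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Classic Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = expected_value(f, x, s | {i}, n) - expected_value(f, x, s, n)
                contribution += weight * gain
        phi.append(contribution)
    return phi


x = (1, 1)
phi = shapley_values(toy_model, x)
base = expected_value(toy_model, x, set(), len(x))

# Local accuracy: the base value plus all attributions reproduces the prediction.
assert abs(base + sum(phi) - toy_model(x)) < 1e-9
```

For this toy model the base value is 20 and each feature receives an attribution of 30, which together recover the prediction of 80. The symmetric split reflects that both features play identical roles in the model.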
Similar resources
Consistent Individualized Feature Attribution for Tree Ensembles
Interpreting predictions from tree ensemble methods such as gradient boosting machines and random forests is important, yet feature attribution for trees is often heuristic and not individualized for each prediction. Here we show that popular feature attribution methods are inconsistent, meaning they can lower a feature’s assigned importance when the true impact of that feature actually increas...
Attribution Style in Patients with Comorbid Anxiety and Depression
Objective: The present study investigates attribution style in patients with comorbid anxiety and depression. Method: Subjects were 26 patients with major depression, 25 patients with generalized anxiety disorder, 17 patients with comorbid anxiety and depression, and 30 normal individuals. The instruments used for data collection were the Beck Depression Inventory, ...
The Utility of Randomness in Decision Tree Ensembles
The use of randomness in constructing decision tree ensembles has drawn much attention in the machine learning community. In general, ensembles introduce randomness to generate diverse trees and in turn they enhance ensembles’ predictive accuracy. Examples of such ensembles are Bagging, Random Forests and Random Decision Tree. In the past, most of the random tree ensembles inject various kinds ...
Predicting Cognitive Emotion Regulation Strategies Based on Metacognitive Beliefs and Attribution Styles in Addicts Undergoing Withdrawal
Objective: The purpose of this study was to predict cognitive emotion regulation strategies based on metacognitive beliefs and attribution styles in addicted individuals during the abstinence period. Method: The statistical population of this study consisted of individuals who had been referred to one of the addiction therapy centers in Ghods city in 2016 (under methadone therapy). One hundred and sixty-thr...
Authorship Attribution Based on Feature Set Subspacing Ensembles
Authorship attribution can assist the criminal investigation procedure as well as cybercrime analysis. This task can be viewed as a single-label multi-class text categorization problem. Given that the style of a text can be represented as mere word frequencies selected in a language-independent method, suitable machine learning techniques able to deal with high dimensional feature spaces and sp...
Journal title:
- CoRR
Volume abs/1706.06060, Issue -
Pages -
Publication date 2017